Lipreading using convolutional neural network
نویسندگان
چکیده
In recent automatic speech recognition studies, deep learning architecture applications for acoustic modeling have eclipsed conventional sound features such as Mel-frequency cepstral coefficients. However, for visual speech recognition (VSR) studies, handcrafted visual feature extraction mechanisms are still widely utilized. In this paper, we propose to apply a convolutional neural network (CNN) as a visual feature extraction mechanism for VSR. By training a CNN with images of a speaker’s mouth area in combination with phoneme labels, the CNN acquires multiple convolutional filters, used to extract visual features essential for recognizing phonemes. Further, by modeling the temporal dependencies of the generated phoneme label sequences, a hidden Markov model in our proposed system recognizes multiple isolated words. Our proposed system is evaluated on an audio-visual speech dataset comprising 300 Japanese words with six different speakers. The evaluation results of our isolated word recognition experiment demonstrate that the visual features acquired by the CNN significantly outperform those acquired by conventional dimensionality compression approaches, including principal component analysis.
منابع مشابه
LCANet: End-to-End Lipreading with Cascaded Attention-CTC
Machine lipreading is a special type of automatic speech recognition (ASR) which transcribes human speech by visually interpreting the movement of related face regions including lips, face, and tongue. Recently, deep neural network based lipreading methods show great potential and have exceeded the accuracy of experienced human lipreaders in some benchmark datasets. However, lipreading is still...
متن کاملDouble-Star Detection Using Convolutional Neural Network in Atmospheric Turbulence
In this paper, we investigate the usage of machine learning in the detection and recognition of double stars. To do this, numerous images including one star and double stars are simulated. Then, 100 terms of Zernike expansion with random coefficients are considered as aberrations to impose on the aforementioned images. Also, a telescope with a specific aperture is simulated. In this work, two k...
متن کاملResource aware design of a deep convolutional-recurrent neural network for speech recognition through audio-visual sensor fusion
Today’s Automatic Speech Recognition systems only rely on acoustic signals and often don’t perform well under noisy conditions. Performing multi-modal speech recognition processing acoustic speech signals and lip-reading video simultaneously significantly enhances the performance of such systems, especially in noisy environments. This work presents the design of such an audio-visual system for ...
متن کاملA Convolutional Neural Network based on Adaptive Pooling for Classification of Noisy Images
Convolutional neural network is one of the effective methods for classifying images that performs learning using convolutional, pooling and fully-connected layers. All kinds of noise disrupt the operation of this network. Noise images reduce classification accuracy and increase convolutional neural network training time. Noise is an unwanted signal that destroys the original signal. Noise chang...
متن کاملProvide a Deep Convolutional Neural Network Optimized with Morphological Filters to Map Trees in Urban Environments Using Aerial Imagery
Today, we cannot ignore the role of trees in the quality of human life, so that the earth is inconceivable for humans without the presence of trees. In addition to their natural role, urban trees are also very important in terms of visual beauty. Aerial imagery using unmanned platforms with very high spatial resolution is available today. Convolutional neural networks based deep learning method...
متن کاملCystoscopy Image Classication Using Deep Convolutional Neural Networks
In the past three decades, the use of smart methods in medical diagnostic systems has attractedthe attention of many researchers. However, no smart activity has been provided in the eld ofmedical image processing for diagnosis of bladder cancer through cystoscopy images despite the highprevalence in the world. In this paper, two well-known convolutional neural networks (CNNs) ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2014